Swahili Speech Dataset Development and Improved Pre-Training Method for Spoken Digit Recognition

Abstract

A speech dataset is an essential component in building commercial speech applications. However, low-resource languages such as Swahili lack a resource that is vital for spoken digit recognition. For languages where such resources exist, they are usually insufficient. Thus, pre-training methods have been used with external resources to improve continuous speech recognition. To the best of our knowledge, no study has investigated the effect of pre-training methods specifically for spoken digit recognition. This study aimed at addressing these problems. First, we developed a Swahili spoken digit dataset. Then, we investigated the effect of cross-lingual and multi-lingual pre-training methods on spoken digit recognition. Finally, we proposed an effective language-independent pre-training method for spoken digit recognition. The proposed method has the advantage of incorporating target language data during the pre-training stage, which leads to an optimal solution when using less training data. Experiments on Swahili (being developed), English, and Gujarati datasets show that the proposed method achieves better performance compared to all the baselines listed in this study.

Similar resources

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

An Efficient Method for Removing Deletion Errors in Quickly-spoken Connected Mandarin Digit String Speech Recognition

Connected Mandarin digit string speech, especially at rapid spoken rate, is very difficult to recognize correctly. In this paper, a new training method named neighboring digits pattern is proposed in order to eliminate most of deletion errors which frequently occur in Mandarin digits speech recognition at high speaking rate when we have enough quickly-spoken speech data as the training set. The...

Improved Method of Handwritten Digit Recognition

MNIST database serves for comparison of different methods of handwritten digit recognition. There are many data related to different classifier recognition rates among which our neural classifier had the second place [1] (recognition rate 99.21%). At present we develop improvements of neural network structure and algorithms of handwritten digit recognition. Improved classifier has recognition r...

A New Step in Arabic Speech Identification: Spoken Digit Recognition

This work presents a new algorithm to recognize separate voices of some Arabic words, the digits from zero to ten. Firstly we prepare our signal by pre-processing trial. Next the speech signal is processed as an image by Power Spectrum Estimation. For feature extraction, transformation and hence recognition, the algorithm of minimal eigenvalues of Toeplitz matrices together with other methods o...

Training data clustering for improved speech recognition

We present an approach to cluster the training data for automatic speech recognition (ASR). A relative-entropy based distance metric between training data clusters is defined. This metric is used to hierarchically cluster the training data. The metric can also be used to select the closest training data clusters given a small amount of data from the test speaker. The selected clusters are then u...

Journal

Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing

Year: 2023

ISSN: 2375-4699, 2375-4702

DOI: https://doi.org/10.1145/3597494